The evaluation of bilingual education programs is
complicated by such factors as the diversity of evaluation
methodologies and program goals and the reliability of instruments
for minority language students. Three bilingual program evaluations
in foreign countries are described in terms of their different
contexts and approaches in order to raise issues about bilingual
education program evaluation. The programs evaluated were the St.
Lambert French immersion program in Canada, the Yoruba 6-year primary
project in Nigeria, and the local language literacy training project
in the southern Sudan. Based on these evaluation experiences, the
strengths and weaknesses of quantitative and qualitative evaluation
methods are discussed. A combination of quantitative and qualitative
evaluation methods is suggested as a means of maximizing the
strengths of each approach. However, it is important that such a
combined approach be carefully designed. (RW)
APPROACHES TO THE EVALUATION OF BILINGUAL EDUCATION: AN INTERNATIONAL
PERSPECTIVE

Gary A. Cziko

Introduction 

The problem of evaluating bilingual education programs in the U.S.
is an exceedingly complex one. First, there is the problem of choosing
an appropriate methodology for conducting evaluations, one which must take
into account the debate between quantitative and qualitative approaches
to evaluation. Second, there is a diversity of goals for bilingual
education in the U.S., with some programs attempting to transition
students into all-English programs as quickly as possible while others
attempt to maintain or restore knowledge of the students' first language
and culture. Third, one must take into account the possibility that
measures commonly used to assess academic achievement, language
proficiency, and attitudes may not possess adequate reliability and/or
validity and may be seriously biased against children who have not had
much contact with the majority language and culture of this country.
Fourth, students' success within bilingual education programs appears to
be influenced not only by the type and quality of their educational
program, but also by psychological and sociolinguistic factors operating
within the context of the classroom, school, and community. Finally,
evaluators working within the context of bilingual programs must have
adequate knowledge of the language and culture of the groups with which
they are working.

It is no wonder, then, that there is so much debate, controversy,
and disagreement concerning both the appropriate methodology for
evaluating bilingual programs and the interpretation and implications of
the many bilingual education program evaluations that have been
conducted. To adequately address all of these issues would require a
major effort and may well be beyond the ability of any one person; it is
certainly beyond the scope of this paper. The objective of this paper
is considerably more modest: it will attempt to bring an international
perspective to bear on problems of concern to bilingual education
evaluators and educators in this country.

In contrast to most fields of education, in which the U.S. is at the
forefront in research, bilingual education has been one area where
American researchers have spent considerable time examining the success
of other countries with what is both a very old and a very new approach
to education. Although I am now actively involved in bilingual
education evaluation and research in the U.S., I have been fortunate to
have been involved in the evaluation of a number of bilingual education
programs outside this country. In this paper, I will briefly describe
three bilingual education projects I have evaluated (one in Canada and
two in Africa) which are particularly interesting because they took
place in very different contexts and were evaluated using quite
different approaches, approaches reflecting both changes in the field of
educational evaluation and my own development as an evaluator. I will
use these three projects and their evaluations to raise a number of
basic issues relating to the evaluation of bilingual education programs.
I will then offer some tentative conclusions and suggestions concerning
the evaluation of bilingual education in the U.S.




Canada: French Immersion Programs

One of the most influential evaluations of a bilingual education
program has been that of the French immersion program begun in 1965 in St.
Lambert, a suburb of Montreal (Lambert and Tucker, 1972). The purpose
of this program is to allow English-Canadian children to acquire
functional bilingualism in French and English. This is done by teaching
all subjects in French until Grade 2 or 3, at which time English
language arts are introduced for the first time. While virtually all
evaluations of the original St. Lambert program, as well as of other French
immersion programs throughout Canada, have been favorable, it is
interesting to take a close look at the approach used to evaluate the
original St. Lambert project.

The approach used by Lambert and Tucker (1972) can perhaps be
described as an experimental, quantitative approach to educational
evaluation. This is characterized by the selection of experimental and
control groups, attempts to equate the experimental and control groups
before the beginning of the program, the systematic administration of a
large number of measures of academic achievement, language proficiency,
language use, and attitudes, followed by statistical tests of
significance between the experimental and control groups. In short, all
data were quantified and all judgments of the impact of the program were
based on statistical comparisons between the experimental and control
groups. The only qualitative data in the original evaluation appear in
an appendix to Lambert and Tucker's book, which briefly describes the
class activities of the experimental group from kindergarten through
Grade 5.
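The last step of this design, a test of significance between the experimental and control groups, can be illustrated with a minimal sketch. It assumes two hypothetical lists of test scores and uses Welch's t statistic, one common choice for comparing two group means; it is not the specific analysis Lambert and Tucker reported, and turning t into a p-value would require a statistics library.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(experimental, control):
    """Return Welch's t statistic and the Welch-Satterthwaite degrees of freedom."""
    n1, n2 = len(experimental), len(control)
    # Squared standard errors of each mean, using sample variances (n - 1 denominator).
    se1 = variance(experimental) / n1
    se2 = variance(control) / n2
    t = (mean(experimental) - mean(control)) / sqrt(se1 + se2)
    df = (se1 + se2) ** 2 / (se1 ** 2 / (n1 - 1) + se2 ** 2 / (n2 - 1))
    return t, df


# Hypothetical French proficiency scores (illustrative data only).
experimental = [78, 85, 82, 90, 74, 88]
control = [60, 65, 58, 70, 62, 66]
t, df = welch_t(experimental, control)
print(f"t = {t:.2f}, df = {df:.1f}")
```

Welch's version is used here because it does not assume equal variances in the two groups, which is rarely guaranteed in field settings.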

In spite of the fact that the evaluation approach used by Lambert
and Tucker would not be appropriate for the great majority of bilingual
education programs in the U.S., this and subsequent evaluations of
French immersion programs in Canada have been taken by many in this
country as a standard for evaluating bilingual education programs in the
U.S. There appear to be a number of reasons for this. First, the
general findings of the evaluation of the original St. Lambert project
have been replicated many times throughout Canada (see Genesee, 1976),
and it has become a generally accepted fact that these programs reliably
result in favorable outcomes among English-Canadian children. It has
also become quite clear that it is the French immersion program itself,
and not other confounding factors, which is responsible for the
increased French-language proficiency of children participating in these
programs. Also, the successful French immersion programs appear to have
raised relatively little opposition in Canada owing to their voluntary
nature, low cost, and acceptability to French- and English-Canadians
alike. Due to these factors, French immersion programs have enjoyed
increasing popularity and can now be found in every major city and
province throughout Canada. It should be noted, however, that this most
prevalent form of bilingual education in Canada has been designed for
English-speaking children, who form the majority linguistic group in
Canada, and that minority groups which speak a language other than
French or English enjoy no legal rights to the use of their native
language in public schools throughout Canada. The French immersion
programs in Canada have essentially shown that middle-class
language-majority children can take a heavy dose of a second language
and learn it remarkably well without detrimental effects to their
first-language development. This finding certainly has important
educational implications south of the Canadian border, but not
necessarily for linguistic minority children in the U.S.

While the results of the French immersion evaluations have served
to extend bilingual education in Canada, these same results have been
used repeatedly by critics of bilingual education in the U.S. as a
rationale for eliminating bilingual education in this country.
Although the Canadian researchers have repeatedly warned that their
findings are not generalizable to linguistic minority children in the
U.S., these evaluations have been repeatedly used as evidence that
children can be immersed in a second language at school with no ill
effects to their linguistic and academic development (see Baker and
deKanter, 1981; Epstein, 1977, pp. 53-54). This has been done in spite
of the fact that there are probably more differences than similarities
between English-Canadian children in French immersion programs and
linguistic minority children in the U.S. who are submerged in
all-English educational programs.

In addition to the danger of overgeneralizing the Canadian
evaluation results, another danger to bilingual education in the U.S.
lies in the experimental, quantitative approach to evaluation used in
the Canadian studies. This approach to evaluation seems to be commonly
regarded as the only "scientific" way of demonstrating the effectiveness
of bilingual education programs, in spite of the fact that such an
approach is usually impossible to apply to bilingual education programs
in the U.S., since it is illegal to keep eligible children out of
bilingual programs to create control groups. Another problem with this
approach to evaluation is its tendency to rely heavily on quantitative
outcome measures while putting relatively little emphasis on describing
the context of the community, school, and classroom in which the program
takes place. Although it is clearly important to measure the outcomes
of bilingual education programs, the great variation in the way these
programs are implemented makes the collection of outcome data of little
practical use without a detailed description of the program and its
content. We already know that there are both effective and ineffective
bilingual education programs in the U.S. (see Troike, 1978). What we
need to know are the factors which differentiate effective from
ineffective programs. Evaluations which give little attention to what
goes on in the classroom and to what the children, parents, teachers,
and administrators think and feel about the program would not appear to
offer much useful information concerning why some programs are effective
and others are less so.

Nigeria: The Yoruba Six-Year Primary Project

Prior to 1970, Yoruba (for Primary 1 through 3) and then
English (for Primary 4 through 6) were used as media of instruction in
the six-year primary education of all Yoruba-speaking children in the
western part of Nigeria. The Yoruba Six-Year Primary Project (Afolayan,
1976) was initiated in 1970 in an attempt to devise a program which
would make the primary education of these children more effective and
meaningful by using Yoruba as the sole medium of instruction for the
first six years of school. To test the effects of the exclusive use of
Yoruba as the medium of instruction, a research project was initiated.
Experimental and control classes were set up at St. Stephen's "A" School
in Ile-Ife. Early in the implementation of the project, however, the
project administrators found what they believed to be serious defects in
the primary school curriculum and so took on the job of creating a new
curriculum incorporating the subjects of Yoruba, English, science,
social and cultural studies, and mathematics. Both the St. Stephen's
experimental and control groups have made use of this new curriculum.
In addition, a specialist teacher of English was used to provide English
instruction for the experimental class, while the usual classroom teacher
provided English instruction for the control class.

In 1973, the project was expanded to include ten additional
"proliferation" schools, eight of which were to use Yoruba as the sole
medium of instruction (the proliferation group) while the remaining two
were to follow the usual pattern of three years of Yoruba followed by
three years of English (the proliferation control group). A
comprehensive evaluation of the Yoruba Primary Project was initiated in
1976 with the testing of academic achievement, Yoruba- and
English-language skills, and intelligence of Primary 3 children in the
St. Stephen's and proliferation experimental and control classes, as
well as children in selected traditional schools. This large-scale,
longitudinal evaluation was to continue until these children had
completed primary school, to determine the effect of using Yoruba as the
exclusive medium of instruction, the impact of the new curriculum
materials, and the effectiveness of using a specialist teacher for the
teaching of English.

The evaluation design that evolved for this project was primarily
quantitative, relying on a large number of tests that were administered
to project and control schools in both urban and rural settings and on
analysis of covariance to compare test performance while controlling for
differences in socioeconomic status. In retrospect, it is in some ways
surprising that such an approach was used, since it involved the
construction of a large number of tests (few tests in the Yoruba
language were available) and sophisticated data analysis techniques
(e.g., analysis of covariance) made possible only by the presence of
modern computing facilities at the University of Ife. (It should be
mentioned that initially the project staff felt that a systematic,
quantitative evaluation organized by an outsider was not necessary,
since they were already personally convinced of the success of the
project. It was only after continued pressure from the Ford Foundation,
which had provided substantial financial support for the project, that
the staff agreed to evaluate the project.) The evaluation of this
project produced fairly favorable results, showing that for fostering
academic achievement, Yoruba is as effective as or more effective than
English as a medium of instruction throughout all six years of primary
school in the Yoruba-speaking areas of western Nigeria (see Ojerinde,
1979). However, as in the original evaluation of the St. Lambert French
immersion program in Montreal, apparently no information has been
collected on the content and form of classroom activities in either
project or control schools, and continued evaluation of the progress of
participating students needs to be done as they continue through
secondary school, where all instruction takes place in English.
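Analysis of covariance, as used here, compares group means on an outcome after adjusting for a covariate such as socioeconomic status. The sketch below assumes hypothetical (SES, score) pairs for each group and computes covariate-adjusted means from the pooled within-group regression slope; it omits the accompanying F test and is not a reconstruction of the project's actual analysis.

```python
from statistics import mean


def ancova_adjusted_means(groups):
    """groups: dict mapping a group name to a list of (covariate, outcome) pairs.

    Returns the pooled within-group slope and each group's outcome mean
    adjusted to the grand mean of the covariate.
    """
    sxy = sxx = 0.0
    for pairs in groups.values():
        mx = mean(x for x, _ in pairs)
        my = mean(y for _, y in pairs)
        # Accumulate within-group cross-products and sums of squares.
        sxy += sum((x - mx) * (y - my) for x, y in pairs)
        sxx += sum((x - mx) ** 2 for x, _ in pairs)
    b_within = sxy / sxx  # pooled within-group regression slope

    grand_mx = mean(x for pairs in groups.values() for x, _ in pairs)
    adjusted = {}
    for name, pairs in groups.items():
        mx = mean(x for x, _ in pairs)
        my = mean(y for _, y in pairs)
        # Shift each group's mean as if its covariate mean equalled the grand mean.
        adjusted[name] = my - b_within * (mx - grand_mx)
    return b_within, adjusted


# Hypothetical (SES index, test score) pairs; names and values are illustrative.
groups = {
    "group_a": [(1, 50), (2, 55), (3, 60)],
    "group_b": [(3, 58), (4, 63), (5, 68)],
}
b, adjusted = ancova_adjusted_means(groups)
print(b, adjusted)
```

In this toy data the raw means (55 and 63) reverse order once the SES difference is taken into account (60 and 58), which shows why controlling for such a covariate can matter.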

The Southern Sudan: The Local Languages Literacy Training Project

The Southern Regional Ministry of Education of the Sudan, in
cooperation with the Summer Institute of Linguistics, is currently
involved in a comprehensive project designed to teach literacy skills to
elementary school pupils in the Southern Sudan, using 9 of the 53 or so
local languages of the region. The project involves the development,
production, and dissemination of materials in the local languages, as
well as the training of teachers in the use of these materials.

The problems encountered in developing and implementing an
evaluation of this project were considerable, due to the fact that the
Southern Sudan is one of the poorest and least developed areas of the
world. Some of these problems were the unavailability of testing
instruments, the difficulty of communication and travel in the area, and
the difficulty of locating educated native speakers of the local
languages to aid in test construction and data analysis. Nevertheless,
the first impact evaluation of the project took place in
November/December 1980, coinciding with the in-class use of trial
editions of Primary 1, 2, and 3 materials in Bari, Lotuho, Dinka, and
Ndogo. Of these four languages, Bari and Lotuho were selected for the
evaluation because of the relative accessibility of the Bari and Lotuho
schools from Juba, the regional capital. During a visit to the region
in June 1980, the author visited several rural Bari and Lotuho primary
schools, some using the new materials, some not. Four of these schools
were selected for inclusion in the impact evaluation: a Bari school
using the project's Bari materials for Primary 1 and 2, a comparison
Bari school not using the project's materials, a Lotuho school using
the project's Lotuho materials for Primary 1 and 2, and a Lotuho school
not using the project's Lotuho materials. All four schools provided at
least the basic necessary facilities, i.e., shelter, blackboards, chalk,
paper, and writing instruments.

Three general types of information were collected from the four
schools. First, general background information was collected on the
size of each school (enrollment at each grade), the curriculum (subjects
taught at each grade, by whom, using what materials), the teachers
(education, teaching experience, subjects taught), and the Primary 2
pupils (name, mother tongue, age). Second, information was collected on
the actual teaching activities of the Primary 2 vernacular teacher at
each school. These data were obtained by tape recording and taking
notes on a complete vernacular reading lesson which the author attended,
assisted by an educated adult speaker of the vernacular who was fluent
in English and who was able to provide the author with explanations and
interpretations of the class activities. Finally, information was
collected on the actual reading performance of Primary 2 pupils in each
of the four Primary 2 classes. This was obtained by administering a
group test of word recognition to each class and by tape recording
performance on individual tests of oral reading and reading
comprehension. The oral reading test consisted of four parts: (a) a
list of ten words included in the project materials; (b) a list of ten
words not contained in the project materials; (c) a short story of
approximately 50 words containing all the words in the two lists; and
(d) five comprehension questions based on the story. Each pupil was
asked to read aloud the lists and story and to answer orally the
questions pertaining to the story. In addition, each pupil was asked to
give his or her reasons for attending school and for wanting to learn to
read.

The outcome data collected via quantitative test instruments
clearly showed that pupils in both the project and comparison classes
were having difficulty learning to read. For example, on the story
reading test described above, the project pupils tested could read only
a mean of \l% of all the words of the story. While this in itself is
informative, it is the process data collected in the reading classrooms
which give us clues as to why pupils were experiencing difficulties.
Virtually all of the reading activities involved mechanical repetition
or recitation of letters, words, phrases, sentences, or stories either
presented in the pupils' materials or printed on the board, with only a
very few instances of activities which required pupils to attempt to
comprehend what they read.
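The story-reading score reported above is a simple proportion: the words a pupil read correctly divided by the number of words in the story, averaged over the pupils tested. A minimal sketch, assuming each pupil's oral reading has already been scored word by word (a hypothetical representation with one boolean per story word):

```python
from statistics import mean


def percent_words_read(word_marks):
    """word_marks: one boolean per story word (True = read correctly)."""
    return 100.0 * sum(word_marks) / len(word_marks)


def class_mean_percent(pupils):
    """pupils: list of per-pupil word_marks lists for the same story."""
    return mean(percent_words_read(marks) for marks in pupils)


# Hypothetical scoring of a 50-word story for three pupils.
pupils = [
    [True] * 10 + [False] * 40,  # 10 of 50 words read correctly
    [True] * 5 + [False] * 45,   # 5 of 50
    [True] * 15 + [False] * 35,  # 15 of 50
]
print(class_mean_percent(pupils))
```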

Also, the analysis of the errors made by some of the Bari project
pupils in answering the class comprehension questions was revealing.
Since many of the comprehension questions could be answered by simply
repeating an appropriate sentence or part of a sentence from the story,
it was often not clear whether the pupils were actually understanding
the stories or simply memorizing them from repeatedly hearing them read
aloud by the teacher and by the class. However, three pupils during the
Bari project literacy lesson began their answers to comprehension
questions with the word "a", which means roughly "and then" and is often
used at the beginning of a non-initial sentence of a Bari story for the
purpose of text cohesion. Answering an oral question with a sentence
beginning with "a" is not appropriate (in fact the teacher vigorously
corrected these pupils) and seems to indicate that these pupils had in
fact memorized the story and went so far as to violate some basic
discourse rules of spoken Bari to use what they had memorized to answer
the question.

Fortunately, the information obtained from an analysis of the
reading materials, the classroom observations, and the reading test
results has led to a number of recommendations concerning materials and
teaching techniques which, if followed, we feel will have a positive
impact on the acquisition of literacy skills in the vernacular languages
as well as in English and Arabic (see Cowan, 1980; Cziko, in press).
Future planned evaluations of the project will allow us to determine the
feasibility and impact of such changes on the acquisition of literacy
skills in the Southern Sudan.

Implications for the Evaluation of Bilingual Education Programs 

In this final section, I will attempt to draw implications from the
experiences described above, focusing on the distinction between
quantitative and qualitative approaches to evaluation and their
particular strengths and weaknesses when applied to evaluating bilingual
education programs in the U.S. and abroad. This section will deal with
(a) strengths and weaknesses of quantitative approaches to evaluation of
bilingual education, (b) strengths and weaknesses of qualitative
approaches to evaluation of bilingual education, and (c) a rationale for
combining aspects of both quantitative and qualitative evaluation
approaches for evaluating bilingual education programs.

Quantitative Evaluation Methods. Quantitative evaluation methods
have a number of important strengths, not the least of which is the
respect that quantitative, experimental research methods still enjoy
among many educational researchers and educational policy planners. The
primary advantage of this evaluation approach is that, when properly
followed, it can provide convincing evidence that a particular bilingual
education program is or is not having a measurable impact on
participating students. In addition, it permits isolation of the
effects of the bilingual education program from the possible effects of
other competing, potentially confounding factors. This approach also
has the advantage of producing results which can be summarized fairly
easily, a distinct advantage when a number of different program
evaluations are being considered to provide evidence for planning
educational policy. However, the design requirements of a properly
implemented quantitative evaluation, especially one following an
experimental or quasi-experimental design, are such that this approach
is seldom feasible for evaluating bilingual education programs in the
U.S.

These designs require the random assignment of students to the
experimental and control groups or, at the least, assurance that the two
groups do not differ systematically on any factors, other than the
differences in educational program, which might influence the evaluation
results. Anyone familiar with bilingual education in the U.S. knows
that random assignment of children who are eligible for bilingual
education to bilingual and non-bilingual programs is usually not
possible and, in fact, would likely be in violation of both Federal and
state regulations. Also, eligible children who do not participate in
bilingual education programs do not do so for two principal reasons:
either their parents have decided, for whatever reason, that they do not
want their children to receive bilingual instruction, or the children
are part of a language group represented by fewer than 20 children in
the school district.[1] Therefore, these children would appear to differ
in important and relevant ways from children of the same language group
who are receiving bilingual education, and would not appear to comprise
a proper control group.[2]

[1] For a summary of state legislation on the number of students
necessary to "trigger" a bilingual education program in a district, see
Gray, 1981.

In addition to the problem of selecting students for the bilingual
and non-bilingual programs for the purpose of conducting a quantitative
evaluation, program variables other than the presence or absence of
bilingual education may also confound the evaluation results (see Baker
and deKanter's, 1981, criticism of the McConnell study). In fact, in
one analysis of the results of the Yoruba evaluation described above, it
was found that there were statistically significant differences in test
performance among four classes which were all part of the same treatment
group (Cziko and Ojerinde, 1976). For these and other reasons, it is
extremely difficult, if not impossible, to undertake a quantitative
evaluation of a bilingual education program using an evaluation design
above reproach by either proponents or critics of bilingual education.
It should not be surprising, therefore, that of the several hundred
evaluations reviewed by Baker and deKanter (1981), only 28 were found to
be free of serious methodological problems typical of quantitative
evaluations of bilingual education programs.
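Differences among several classes within one treatment group, as in the Yoruba analysis cited above, are conventionally tested with a one-way analysis of variance. The sketch below assumes hypothetical per-class score lists and computes the F ratio with its degrees of freedom; it is a generic illustration rather than the computation Cziko and Ojerinde performed. A library routine such as SciPy's `scipy.stats.f.sf(F, df_between, df_within)` could then convert F into a p-value.

```python
from statistics import mean


def one_way_anova_f(classes):
    """classes: list of score lists, one per class.

    Returns (F, df_between, df_within) for a one-way analysis of variance.
    """
    k = len(classes)
    n = sum(len(c) for c in classes)
    grand = mean(score for c in classes for score in c)
    means = [mean(c) for c in classes]
    # Between-class variation: how far each class mean lies from the grand mean.
    ss_between = sum(len(c) * (m - grand) ** 2 for c, m in zip(classes, means))
    # Within-class variation: spread of pupils around their own class mean.
    ss_within = sum((s - m) ** 2 for c, m in zip(classes, means) for s in c)
    df_between, df_within = k - 1, n - k
    f = (ss_between / df_between) / (ss_within / df_within)
    return f, df_between, df_within


# Hypothetical scores for four classes receiving the same treatment.
classes = [[12, 15, 14, 10], [18, 20, 17, 19], [11, 13, 12, 14], [16, 15, 18, 17]]
print(one_way_anova_f(classes))
```

A large F relative to its reference distribution suggests that class-level factors, not only the program, are shaping scores.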



[2] It is interesting to note that failure to comply with the
requirements of experimental or quasi-experimental evaluation designs
has been the principal criticism of quantitative evaluations of
bilingual education in the U.S. by both critics and proponents of
bilingual education. It seems that when evaluations find bilingual
education to be ineffective, proponents of bilingual education are
quick to point out the non-equivalence of the bilingually educated and
all-English educated groups (see, e.g., Gray's, 1978, criticism of the
AIR evaluation by Danoff, 1978) and, conversely, when quantitative
evaluations find bilingual education to be effective, critics of
bilingual education are equally quick to point out that factors involved
in the selection of children for the bilingual program, and not the
program itself, may be responsible for the results (see Baker and
deKanter, 1981).
Quantitative evaluations of bilingual education also require a
degree of consensus concerning the goals of bilingual education. This
may not pose a serious problem at state or Federal levels, since in the
U.S. the primary objective of bilingual education is to facilitate the
transition of limited-English-proficient children into all-English
programs as quickly as possible. However, many bilingual program
directors, teachers, parents, and students may well have other goals,
e.g., the development of a high level of proficiency in the children's
native language and/or the maintenance of certain features and knowledge
of their native culture. Therefore, quantitative evaluations designed
with state and/or Federal reviewers in mind may not provide information
relevant to the needs and concerns of people who are closer to the
program. In this respect, it is interesting to note that the current
system being used in Illinois to collect evaluative information on
state-funded bilingual programs does not involve the collection and
reporting of any data on native-language skills (Illinois State Board of
Education, 1981). In my experience, I have found this rift between
"official" goals and actual goals at the program level to exist
primarily in the U.S., where bilingual education program directors and
some bilingual teachers often have "maintenance" or "restorationist"
goals with respect to children's native languages, while official state
or Federal policy is oriented to the "transitional" goal of moving
children from bilingual to all-English instruction as quickly as
possible.

A final weakness of quantitative methods for evaluating bilingual
education programs is the necessity of reliable and valid instruments to
translate the behavior and feelings of those involved in the programs
into meaningful, useful numbers. Among the many problems related to
this requirement of a quantitative approach to evaluation are: (a)
deciding whether norm-referenced or criterion-referenced tests are
appropriate (see Block, 1971; Ebel, 1971); (b) the possible bias of
standardized tests for language minority students (see Olmedo, 1977);
and (c) locating or constructing measures of language proficiency which
are practical to administer and yet take into account the various
components of language skills which are now believed to make up
communicative competence (see Canale and Swain, 1980). These remain
serious issues in the U.S. in spite of the large amount of research
undertaken to deal with them and the efforts of major publishers in the
U.S. to produce reliable, valid, and unbiased standardized tests of
academic achievement. The reliance of quantitative evaluation
approaches on reliable and valid measuring instruments means that this
approach is of limited usefulness in settings where tests are either not
available or where there are few or no resources available for the
construction and validation of such tests.
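Reliability in this sense is often estimated from a single test administration with an internal-consistency coefficient such as Cronbach's alpha: alpha = k/(k - 1) * (1 - sum of item variances / variance of total scores), where k is the number of items. The paper does not prescribe any particular index; the sketch below only shows the computation for a hypothetical matrix of item scores.

```python
from statistics import pvariance


def cronbach_alpha(scores):
    """scores: list of examinees, each a list of k item scores.

    Assumes at least two items and non-zero variance in total scores.
    """
    k = len(scores[0])
    items = list(zip(*scores))  # transpose: one column of scores per item
    item_var = sum(pvariance(col) for col in items)
    total_var = pvariance([sum(row) for row in scores])
    return k / (k - 1) * (1 - item_var / total_var)


# Hypothetical right/wrong (1/0) scores of four pupils on three items.
scores = [
    [1, 0, 1],
    [1, 1, 1],
    [0, 0, 1],
    [0, 0, 0],
]
print(cronbach_alpha(scores))
```

Population variances are used for both the items and the totals; because the same denominator appears in numerator and denominator of the ratio, sample variances would give the same alpha.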

Qualitative Evaluation Approaches. If quantitative approaches to
the evaluation of bilingual education programs suffer from so many
apparent weaknesses, might not more qualitative (or naturalistic)
evaluation methods be better suited to the task? While much has
recently been written about the need for and strengths of qualitative
approaches to educational evaluation (see Guba's, 1978, naturalistic
inquiry; Stake's, 1975, 1978, responsive evaluation and case study
approach; Patton's, 1980, qualitative evaluation methods), there seems
to have been very little use of these qualitative evaluation approaches
to examine bilingual education programs in the U.S. or abroad. This is
not surprising in the U.S. when one considers that Federal regulations
require that evaluations of Title VII bilingual education programs
employ an evaluation approach based on test scores and appropriate
statistical analyses to show that the bilingual program is having an
impact on participating students' academic achievement (see Burry, 1979,
p. 11).

Qualitative evaluation approaches do offer some appealing
advantages over more quantitative methods, although they have weaknesses
of their own. One important practical strength of a qualitative
approach to evaluating bilingual education programs is that random
assignment of students to bilingual and non-bilingual programs is not
necessary, nor is it even necessary to include a control or comparison
group in the evaluation design, since the primary purpose of a
qualitative approach is to understand how the program works and how it
is viewed by the students, parents, and administrators involved in the
program. Of course, a comparison or control group may be included in
the evaluation design and may well provide important evaluative
information, but the apparently unsolvable problem of random assignment
or of assuring group equivalence before the program begins is not a
prime concern.

Another strength of a qualitative approach to evaluation is that it
requires the evaluator to obtain a detailed, first-hand look at the
program in action, an experience not required by more quantitative
evaluation approaches. Thus, while I spent considerable time observing
literacy classes in the Southern Sudan and listening to and analyzing
tape recordings of these classes, the classroom experience I obtained
during the more quantitative evaluations of French immersion programs in
Canada and the Yoruba project in Nigeria was limited to supervising the
administration of group tests and to courtesy classroom visits, which
involved greeting the teacher and students and perhaps observing the
class in action for a very short period of time. While in-depth
classroom observation may not be necessary if a program is known to be
well implemented and is doing well by quantitative criteria (e.g., the
French immersion programs in Canada), this type of information can be
invaluable for recommending change if a program is not achieving its
goals. Thus, while the reading test performance of children in project
classes of the Sudan project clearly indicated that they were having
considerable difficulty in learning to read, the classroom observations
gave an indication as to why this was the case. As mentioned earlier,
there were only rare instances of classroom activities that actually
required some type of reading comprehension on the part of the
students. This qualitative finding suggests obvious steps for the
improvement of the project, steps that would not be suggested by
quantitative evaluation methods alone.
The final advantage of qualitative evaluation methods to be
mentioned here is that this approach does not require the availability
of reliable and valid quantitative measures of the academic achievement,
language proficiency, or attitudes of participating students. This is a
particularly desirable feature in settings where such measures are not
available (and, as mentioned above, it could be argued that they are not
available even in the U.S.).



Unfortunately, qualitative approaches to evaluation suffer from a
number of weaknesses. One of these is the apparently subjective basis
(using Scriven's, 1972, qualitative meaning of this word) on which the
success or failure of a bilingual program is judged. This seems to be a
particular problem in the U.S., where most evaluators of individual
bilingual education programs appear to have pre-existing views of the
worth and merit of bilingual education. It is imperative, therefore,
that evaluators using qualitative approaches provide strong support for
their conclusions by citing supporting evidence from as many different
sources as possible. While this should be a particular concern of
evaluators using qualitative approaches, it appears to be a widely
unrecognized problem of quantitative approaches as well.

While the use of tests and statistical analyses appears on the
surface to be more objective (and hence reliable), statistical tests in
themselves do not tell us whether a difference in performance on a
given measure between bilingually and monolingually instructed groups
is in fact of practical importance. This is because inferential
statistical tests are influenced by the size of the groups included in
the evaluation, so that the same small difference between group means,
which has no statistical significance when comparing small groups of
students, may in fact be statistically significant when comparing larger
groups. Unfortunately, few evaluators seem to take this fact into
account, and they often consider all statistically significant
differences to be practically significant (see Popham, 1975, p. 239).
This obscures the fact that all conclusions regarding the impact of a
bilingual program, whether measured quantitatively or qualitatively, are
essentially based on subjective judgments.
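To make the point about group size concrete, the following Python sketch compares the same mean difference at two group sizes. The numbers (a 2-point difference on a test with a standard deviation of 15, and groups of 20 and 1,000 students) are hypothetical and are not drawn from any of the evaluations discussed in this paper. The sketch uses a normal approximation to the two-sample t test and reports Cohen's d, a common standardized effect-size index, as one illustration of a measure that does not grow with group size; neither is presented as the method used in these evaluations.

```python
import math


def two_sample_z_test(mean_diff, sd, n_per_group):
    """Approximate two-sided p-value for a difference between two group means.

    Assumes equal group sizes and a common standard deviation, and uses the
    normal approximation to the t distribution (a simplification; for small
    groups the exact t test would give an even larger p-value).
    """
    standard_error = sd * math.sqrt(2.0 / n_per_group)
    z = mean_diff / standard_error
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p_value


def cohens_d(mean_diff, sd):
    """Standardized effect size: the mean difference in SD units, independent of n."""
    return mean_diff / sd


# Hypothetical values chosen only for illustration.
MEAN_DIFF = 2.0
SD = 15.0

for n in (20, 1000):
    z, p = two_sample_z_test(MEAN_DIFF, SD, n)
    d = cohens_d(MEAN_DIFF, SD)
    verdict = "significant" if p < 0.05 else "not significant"
    print(f"n = {n:4d} per group: z = {z:.2f}, p = {p:.4f} ({verdict}), d = {d:.2f}")
```

With these assumed values, the comparison of 20 students per group is not statistically significant at the .05 level while the comparison of 1,000 per group is, even though the effect size (about 0.13 standard deviations) is identical in both cases. Whether a difference of that size matters educationally remains a judgment that no significance test can make.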

Another feature of qualitative approaches to evaluating bilingual
programs that can be considered a drawback in certain contexts is that
the data collected (e.g., classroom observations, open-ended interviews
with students, collections of students' written compositions) are
usually difficult to reduce and summarize. While this same feature
permits the diligent reader of an evaluation to get a good idea of a
particular program in operation, it makes it extremely difficult for
policy-makers to review a large number of qualitative evaluations for
the purpose of giving an empirical base to policy decisions. For this
reason it might be argued that while qualitative evaluations are more
appropriate for obtaining and disseminating detailed information about a
particular program and for providing information for program changes,
quantitative evaluation approaches are more appropriate for synthesizing
a large amount of data obtained from a broad base for the purpose of
making policy decisions. Consistent with this observation is the fact
that both the AIR evaluation (Danoff, 1978) and the recent review of
bilingual evaluations prepared for the Office of Planning and Budget of
the Department of Education (Baker and deKanter, 1981) were concerned
exclusively with quantitative evaluation methods and criteria. It would
appear, then, that evaluators using qualitative approaches to evaluating
bilingual education programs run the risk of having their work ignored
by reviewers and policy planners.

Finally, it must be mentioned that the use of qualitative
evaluation techniques typically demands a very large amount of time and
effort for data collection, analysis, and report writing. Also,
qualitative methods are difficult, if not impossible, to employ in
situations where the evaluator does not have a thorough knowledge of the
language and culture of the school setting, unless trainable research
assistants can be found.
Combining Quantitative and Qualitative Approaches to Evaluation

After having considered some of the strengths and weaknesses of
both quantitative and qualitative approaches to the evaluation of
bilingual education programs, it would appear that an effective
evaluation approach would be to combine aspects of both in order to
minimize the weaknesses and maximize the strengths of the overall
evaluation methodology. Although this is certainly not a new proposal,
it is a fairly recent one (see Patton, 1980; Fry et al., 1981), and it
constitutes an approach which apparently has not been much used for the
evaluation of bilingual education programs. Although my work in the
Southern Sudan attempted to use such an approach, it must be admitted
that the relatively short amount of time devoted to class observations
(ranging from about one to three hours for each classroom included in
the evaluation) did not permit the type of rich, ethnographic data
collection and interpretation which is characteristic of qualitative
evaluation methods. Unfortunately, while such a combined approach might
look appealing on paper, there are probably very few educational
evaluators working today who have the necessary expertise to carry out
such an evaluation, particularly of bilingual education programs. It
also seems that an evaluation of a bilingual education program which
attempts to combine aspects of both quantitative and qualitative
evaluation approaches would have to be very carefully designed if it
were to capitalize on the strengths of both approaches and not suffer
from their combined weaknesses.

While it is beyond the scope of this paper to go into details of
how these two approaches can be combined for evaluating bilingual
education programs (see Fry et al., 1981, and Patton, 1980, for
suggestions on how this can be done for general evaluation and social
science research), it is hoped that this paper has provided a strong
rationale for such an approach. If our evaluations of bilingual
education are to be useful in detecting the impact of such programs,
disseminating information on how such programs operate, and providing
an empirical base for improving bilingual education, then it is clear
that currently used evaluation approaches are not meeting these needs.
New and innovative evaluation approaches are needed. Fortunately, many
innovations in educational evaluation have occurred over recent years,
which have much to add to our work in bilingual education. It is our
challenge to make the most of them and to continue to develop new
evaluation techniques appropriate for determining the most effective
programs for educating language minority students in the U.S.